Extending Temporal Data Augmentation for Video Action Recognition
Authors
Abstract
Pixel space augmentation has grown in popularity in many Deep Learning areas, due to its effectiveness, simplicity, and low computational cost. Data augmentation for videos, however, still remains an under-explored research topic, as most works have been treating inputs as stacks of static images rather than temporally linked series of data. Recently, it has been shown that involving the time dimension when designing augmentations can be superior to spatial-only variants for video action recognition [34]. In this paper, we propose several novel enhancements to these techniques to strengthen the relationship between the spatial and temporal domains and achieve a deeper level of perturbations. The results of our techniques outperform their respective variants in the Top-1 and Top-5 settings on the UCF-101 [55] and HMDB-51 [38] datasets.
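To make the distinction between spatial-only and time-aware augmentation concrete, the following is a minimal sketch and not the method proposed in the paper or in [34]. It assumes a clip stored as a NumPy array of shape (frames, height, width, channels) and uses a simple cutout as the example augmentation; the function names `cutout_static` and `cutout_moving` are hypothetical. The static variant erases the same box in every frame, while the temporal variant moves the box linearly across the clip.

```python
import numpy as np


def cutout_static(video, top_left, size):
    """Spatial-only cutout: erase the same square in every frame."""
    out = video.copy()
    y, x = top_left
    out[:, y:y + size, x:x + size] = 0
    return out


def cutout_moving(video, start, end, size):
    """Time-aware cutout (illustrative assumption): the square's top-left
    corner is interpolated linearly from `start` to `end` over the clip."""
    out = video.copy()
    num_frames = video.shape[0]
    ys = np.linspace(start[0], end[0], num_frames).round().astype(int)
    xs = np.linspace(start[1], end[1], num_frames).round().astype(int)
    for i in range(num_frames):
        out[i, ys[i]:ys[i] + size, xs[i]:xs[i] + size] = 0
    return out


# Toy clip: 8 frames of 32x32 RGB, all ones.
clip = np.ones((8, 32, 32, 3))
static_aug = cutout_static(clip, top_left=(4, 4), size=8)
moving_aug = cutout_moving(clip, start=(0, 0), end=(16, 16), size=8)
```

In the static case a model sees an identical occlusion in every frame, whereas in the moving case the perturbation itself carries temporal structure, which is the general idea the abstract refers to.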
Similar resources
Combining Spatio-Temporal Appearance Descriptors and Optical Flow for Human Action Recognition in Video Data
This paper proposes combining spatio-temporal appearance (STA) descriptors with optical flow for human action recognition. The STA descriptors are local histogram-based descriptors of space-time, suitable for building a partial representation of arbitrary spatio-temporal phenomena. Because of the possibility of iterative refinement, they are interesting in the context of online human action rec...
Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation
Zero-Shot Learning (ZSL) promises to scale visual recognition by bypassing the conventional model training requirement of annotated examples for every category. This is achieved by establishing a mapping connecting low-level features and a semantic description of the label space, referred to as visual-semantic mapping, on auxiliary data. Reusing the learned mapping to project target videos into an...
Compressed Video Action Recognition
Training robust deep video representations has proven to be much more challenging than learning deep image representations and consequently hampered tasks like video action recognition. This is in part due to the enormous size of raw video streams, the associated amount of computation required, and the high temporal redundancy. The ‘true’ and interesting signal is often drowned in too much irre...
Action recognition in video
Automatic action recognition in video has a broad array of applications, from surveillance to interactive video games. Classic algorithms usually use handcrafted descriptors such as SIFT (see [5]) or HOG (see [3]) to compute feature vectors of videos, and have achieved promising results in the past (see [7]). More recently, Quoc Le and Will Zou at the Stanford AI lab have proved that ISA featur...
Recognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in the machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-25825-1_8